Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization
A deeper understanding of video activities extends beyond recognition of
underlying concepts such as actions and objects: constructing deep semantic
representations requires reasoning about the semantic relationships among these
concepts, often beyond what is directly observed in the data. To this end, we
propose an energy minimization framework that leverages large-scale commonsense
knowledge bases, such as ConceptNet, to provide contextual cues to establish
semantic relationships among entities directly hypothesized from video signal.
We mathematically express this using the language of Grenander's canonical
pattern generator theory. We show that the use of prior encoded commonsense
knowledge alleviates the need for large annotated training datasets and helps
tackle imbalance in the training data. Using three different
publicly available datasets (Charades, Microsoft Visual Description Corpus, and
Breakfast Actions), we show that the proposed model can generate video
interpretations whose quality is better than those reported by state-of-the-art
approaches, which have substantial training needs. Through extensive
experiments, we show that the use of commonsense knowledge from ConceptNet
allows the proposed approach to handle various challenges such as training data
imbalance, weak features, and complex semantic relationships and visual scenes.
Comment: Accepted to WACV 201
Shape-Graph Matching Network (SGM-net): Registration for Statistical Shape Analysis
This paper focuses on the statistical analysis of shapes of data objects
called shape graphs, which are sets of nodes connected by articulated curves with
arbitrary shapes. A critical need here is a constrained registration of points
(nodes to nodes, edges to edges) across objects. This, in turn, requires
optimization over the permutation group, made challenging by differences in
nodes (in terms of numbers, locations) and edges (in terms of shapes,
placements, and sizes) across objects. This paper tackles this registration
problem using a novel neural-network architecture and an unsupervised
loss function developed using the elastic shape metric for curves. This
architecture results in (1) state-of-the-art matching performance and (2) an
order of magnitude reduction in the computational cost relative to baseline
approaches. We demonstrate the effectiveness of the proposed approach using
both simulated data and real-world 2D and 3D shape graphs. Code and data will
be made publicly available after review to foster research.